Statistically trained orthographic to sound

Authors

  • Ananlada Chotimongkol
  • Alan W Black
Abstract

Many languages have a non-obvious, but not unrelated, relationship between orthography and pronunciation. Traditional methods for automatic conversion from letters to phones rely on hand-crafted letter-to-sound rules, which require care and expertise to develop. This paper presents a letter-to-sound rule system for Thai that is trained automatically from lexicons. A statistical model, decision trees, is used to predict phones from letters. Letters mapping to multi-phones are used to solve the problems of implicit vowels and final consonant propagation, and pre- and post-processing techniques are used to handle the inversion of initial consonants and vowels. For tone prediction, hand-crafted rules are used instead, since there is no ambiguity once the phonological composition is known. Combining a phone n-gram model with the decision trees, we achieve 68.76% word accuracy, which is better than the 65.15% word accuracy of the rule-based approach.
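One way to picture the combination of decision trees with a phone n-gram model is as a rescoring step over per-letter candidates. The sketch below is not the authors' implementation: it assumes each letter already has a small distribution over candidate phones (as a decision-tree leaf might supply), uses a bigram as the n-gram model, and picks the best sequence with a Viterbi search. The function name `best_phone_sequence`, the `floor` smoothing value, and all toy probabilities are hypothetical.

```python
import math

def best_phone_sequence(candidates, bigram, floor=1e-6):
    """Pick the phone sequence maximizing tree score times bigram score.

    candidates: one dict per letter, mapping phone -> P(phone | letter context)
                (assumed here to come from a decision-tree leaf).
    bigram:     dict mapping (prev_phone, phone) -> P(phone | prev_phone);
                "<s>" marks the start of the word.
    floor:      hypothetical smoothing value for unseen or zero probabilities.
    """
    # For each phone at the current position: (best log score, best path).
    scores = {"<s>": (0.0, [])}
    for dist in candidates:
        new_scores = {}
        for phone, p_tree in dist.items():
            best = None
            for prev, (prev_score, path) in scores.items():
                p_lm = bigram.get((prev, phone), floor)
                s = (prev_score
                     + math.log(max(p_tree, floor))
                     + math.log(max(p_lm, floor)))
                if best is None or s > best[0]:
                    best = (s, path + [phone])
            new_scores[phone] = best
        scores = new_scores
    return max(scores.values(), key=lambda t: t[0])[1]

# Toy data (invented for illustration, not taken from the paper).
candidates = [
    {"k": 0.9, "g": 0.1},
    {"a": 0.55, "o": 0.45},
    {"n": 1.0},
]
bigram = {
    ("<s>", "k"): 0.5, ("<s>", "g"): 0.5,
    ("k", "a"): 0.2, ("k", "o"): 0.8,
    ("a", "n"): 0.5, ("o", "n"): 0.5,
}

# Taking the top tree candidate per letter, with no n-gram model.
greedy = [max(d, key=d.get) for d in candidates]
combined = best_phone_sequence(candidates, bigram)
```

In this toy case the per-letter choice alone prefers `a` for the second letter, while the bigram model tips the combined decision to `o`. That is the kind of correction a sequence model can contribute on top of independent per-letter predictions.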

Similar resources

The Effect of L1 Persian on the Acquisition of English L2 Orthographic System on the Shared Grounds

This paper elaborates on Persian and English orthographic shared aspects to study the effects of L1 Persian on learning English as a foreign language. While there are some examples of letter and sound mismatches in the orthographic system of both languages, those of English are more complex than Persian. In order to see the effect of the mismatch between orthography and transcription, 40 Persia...

Orthographic processing in baboons (Papio papio).

Skilled readers use information about which letters are where in a word (orthographic information) in order to access the sounds and meanings of printed words. We asked whether efficient processing of orthographic information could be achieved in the absence of prior language knowledge. To do so, we trained baboons to discriminate English words from nonsense combinations of letters that resembl...

Non-native production training with an acoustic model and orthographic or transcription cues

The perception and production of non-native speech sounds is the key to learning a new language. The differences between the native and the target language sound systems cause learning problems, but orthographic conventions may also affect the learning process. We tested whether a misleading orthography in contrast to phonemic transcription affects the manner in which native Finns learn to prod...

Orthography influences the perception and production of speech.

One intriguing question in language research concerns the extent to which orthographic information impacts on spoken word processing. Previous research has faced a number of methodological difficulties and has not reached a definitive conclusion. Our research addresses these difficulties by capitalizing on recent developments in the area of word learning. Participants were trained to criterion ...

The influence of consistency, frequency, and semantics on learning to read: an artificial orthography paradigm.

Two experiments explored learning, generalization, and the influence of semantics on orthographic processing in an artificial language. In Experiment 1, 16 adults learned to read 36 novel words written in novel characters. Posttraining, participants discriminated trained from untrained items and generalized to novel items, demonstrating extraction of individual character sounds. Frequency and c...

Publication date: 2000